Optical flow, which expresses pixel displacement, is widely used in many computer vision tasks to provide pixel-level motion information. However, with the remarkable progress of convolutional neural networks, recent state-of-the-art methods are proposed to solve problems directly at the feature level. Since the displacement of feature vectors is not consistent with pixel displacement, a common approach is to forward the optical flow to a neural network and fine-tune this network on the task dataset. With this approach, they expect the fine-tuned network to produce tensors encoding feature-level motion information. In this paper, we rethink this de facto paradigm and analyze its drawbacks in the video object detection task. To mitigate these issues, we propose a novel network (IFF-Net) with an \textbf{I}n-network \textbf{F}eature \textbf{F}low estimation module (IFF module) for video object detection. Without resorting to pre-training on any additional dataset, our IFF module is able to directly produce \textbf{feature flow}, which indicates the feature displacement. Our IFF module consists of a shallow module that shares features with the detection branch. This compact design enables our IFF-Net to accurately detect objects while maintaining a fast inference speed. Furthermore, we propose a transformation residual loss (TRL) based on \textit{self-supervision}, which further improves the performance of IFF-Net. Our IFF-Net outperforms existing methods and sets the state-of-the-art performance on ImageNet VID.
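As a hedged illustration of how an estimated feature flow could be consumed (a generic sketch under assumed shapes and names, not the IFF-Net implementation), the snippet below warps reference-frame features with a per-location displacement field via bilinear sampling:

```python
# Minimal sketch: warping features with a predicted feature flow.
# Shapes and the function name are assumptions for illustration.
import torch
import torch.nn.functional as F

def warp_features(ref_feat: torch.Tensor, feat_flow: torch.Tensor) -> torch.Tensor:
    """ref_feat:  (N, C, H, W) features from the reference frame.
    feat_flow: (N, 2, H, W) displacement in feature-map pixels (dx, dy)."""
    n, _, h, w = ref_feat.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(ref_feat.device)   # (2, H, W)
    coords = base.unsqueeze(0) + feat_flow                            # (N, 2, H, W)
    # Normalize to [-1, 1] as required by grid_sample.
    gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                              # (N, H, W, 2)
    return F.grid_sample(ref_feat, grid, align_corners=True)

ref = torch.randn(1, 256, 32, 32)
flow = torch.zeros(1, 2, 32, 32)                  # zero flow -> identity warp
assert torch.allclose(warp_features(ref, flow), ref, atol=1e-5)
```

In such a setup the flow field would come from a shallow in-network estimation head shared with the detection branch, which is what keeps inference fast.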
Time series shapelets are discriminative subsequences that have recently been found effective for time series clustering (TSC). Shapelets are convenient for interpreting the clusters. Thus, the main challenge for TSC is to discover high-quality variable-length shapelets to discriminate different clusters. In this paper, we propose a novel autoencoder-shapelet approach (AutoShape), which is the first study to take advantage of both autoencoders and shapelets for determining shapelets in an unsupervised manner. The autoencoder is specially designed to learn high-quality shapelets. More specifically, to guide the latent representation learning, we employ the latest self-supervised loss to learn unified embeddings for variable-length shapelet candidates (time series subsequences) of different variables, and propose a diversity loss to select discriminative embeddings in the unified space. We introduce a reconstruction loss to recover shapelets in the original time series space for clustering. Finally, we adopt the Davies-Bouldin index (DBI) to inform AutoShape of the clustering performance during learning. We present extensive experiments on AutoShape. To evaluate the clustering performance on univariate time series (UTS), we compare AutoShape with 15 representative methods using UCR archive datasets. To study the performance on multivariate time series (MTS), we evaluate AutoShape on 30 UEA archive datasets against 5 competitive methods. The results validate that AutoShape is the best among all the compared methods. We interpret the clusters with shapelets and obtain interesting intuitions about the clusters in three UTS case studies and one MTS case study.
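The abstract builds on the standard shapelet-transform idea; the sketch below is a generic, hypothetical illustration of that step (function names and shapes are assumptions, not AutoShape's code): each series is embedded as its minimum distances to the learned shapelets, and a conventional clusterer can then run on this matrix.

```python
# Generic shapelet-transform sketch for univariate series (not AutoShape's code).
import numpy as np

def shapelet_distance(series: np.ndarray, shapelet: np.ndarray) -> float:
    """Minimum Euclidean distance between a shapelet and all equal-length
    subsequences of a 1-D series."""
    m = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, m)   # (T-m+1, m)
    return float(np.min(np.linalg.norm(windows - shapelet, axis=1)))

def shapelet_transform(dataset: list, shapelets: list) -> np.ndarray:
    """Embed each series as a vector of distances to the shapelets; a standard
    clusterer (e.g., k-means) can then operate on this matrix."""
    return np.array([[shapelet_distance(s, sh) for sh in shapelets] for s in dataset])
```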
Compared with traditional model-based fault detection and classification (FDC) methods, deep neural networks (DNNs) have proven effective for aerospace sensor FDC problems. However, the time consumed in training is excessive for DNNs, and interpretability analysis of the FDC neural network remains unconvincing. In recent years, a concept known as imagefication-based intelligent FDC has been studied. This concept advocates stacking the sensor measurement data into an image format, so that the sensor FDC problem is converted into an abnormal-region detection problem on the stacked image, which may well borrow recent advances from the machine vision field. Although promising results are claimed in imagefication-based intelligent FDC studies, the low dimension of the stacked image means that small convolutional kernels and shallow DNN layers were used, which hinders FDC performance. In this paper, we first propose a data augmentation method that inflates the stacked image to a larger size (corresponding to the VGG16 net developed in the machine vision field). The FDC neural network is then trained by fine-tuning VGG16 directly. To truncate and compress the size of the FDC net (and hence its running time), we prune the fine-tuned net. The class activation mapping (CAM) method is also adopted for interpretability analysis of the FDC net to verify its internal operations. Via data augmentation, fine-tuning from VGG16, and model pruning, the FDC net developed in this paper achieves an FDC accuracy of 98.90% across 4 aircraft under 5 flight conditions (running time 26 ms). The CAM results also verify the FDC net with respect to its internal operations.
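A minimal sketch of the fine-tuning setup described above, assuming a torchvision VGG16 backbone and a hypothetical number of fault classes (an illustration of the approach, not the paper's code):

```python
# Sketch: fine-tuning a pretrained VGG16 on stacked sensor images for FDC.
import torch
import torch.nn as nn
from torchvision import models

NUM_FAULT_CLASSES = 8   # hypothetical number of fault/normal classes

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
# Replace the final fully connected layer for the FDC classes.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_FAULT_CLASSES)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One fine-tuning step; `images` are (N, 3, 224, 224) inflated sensor images."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Pruning and CAM analysis would be applied to the fine-tuned model afterwards.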
In this paper, a novel data-driven approach named Augmented Imagefication is proposed for the fault detection (FD) of aircraft air data sensors (ADS). Exemplified by the FD problem of aircraft air data sensors, an online FD scheme based on a deep neural network (DNN) running on an edge device is developed. First, the aircraft inertial reference unit measurements are adopted as equivalent inputs, which is scalable to different aircraft/flight cases. Data associated with 6 different aircraft/flight conditions are collected to provide diversity (scalability) in the training/testing database. Augmented Imagefication is then proposed for the DNN-based prediction of flight conditions. The raw data are reshaped as a grayscale image for the convolutional operation, and the necessity of augmentation is analyzed and pointed out. Different kinds of augmentation methods, i.e., flip, repeat, tile, and their combinations, are discussed, and the results show that repeating along both axes of the image matrix leads to the best DNN performance. The interpretability of the DNN is studied based on Grad-CAM, which provides a better understanding and further solidifies the robustness of the DNN. Next, the DNN model, VGG-16 with augmented imagefication data, is optimized for mobile hardware deployment. After pruning the DNN, a lightweight model (98.79% smaller than the original VGG-16) with high accuracy (slightly up by 0.27%) and fast speed (time delay reduced by 87.54%) is obtained. Hyperparameter optimization of the DNN based on TPE is implemented, and the optimal combination of hyperparameters is determined (learning rate 0.001, 600 iteration epochs, and batch size 100 yield the highest accuracy of 0.987). Finally, an online FD deployment based on the edge device Jetson Nano is developed, and real-time monitoring of the aircraft is achieved. We believe this method is instructive for addressing FD problems in other similar fields.
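A minimal sketch of the "repeat along both axes" augmentation described above; the sizes and the function name are illustrative assumptions:

```python
# Sketch: inflating a small stacked sensor matrix by repeating elements along
# both axes, so it reaches a CNN-friendly grayscale image size.
import numpy as np

def repeat_augment(stacked: np.ndarray, target_hw=(224, 224)) -> np.ndarray:
    """Repeat each element of an (h, w) sensor matrix along both axes until the
    result covers `target_hw`, then crop to the exact size."""
    h, w = stacked.shape
    ry = -(-target_hw[0] // h)   # ceil division
    rx = -(-target_hw[1] // w)
    big = np.repeat(np.repeat(stacked, ry, axis=0), rx, axis=1)
    return big[: target_hw[0], : target_hw[1]]

small = np.random.rand(20, 30)     # e.g., 20 time steps x 30 sensor channels
image = repeat_augment(small)      # (224, 224) grayscale image for the CNN
```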
Graph neural architecture search has received much attention as graph neural networks (GNNs) have recently been successfully applied to non-Euclidean data. However, exploring all possible GNN architectures in the huge search space is too time-consuming or infeasible for big graph data. In this paper, we propose a parallel graph architecture search (GraphPAS) framework for graph neural networks. In GraphPAS, we explore the search space in parallel by designing sharing-based evolutionary learning, which improves search efficiency without losing accuracy. In addition, architecture information entropy is dynamically adopted as the mutation selection probability, which reduces space exploration. The experimental results show that GraphPAS outperforms state-of-the-art models in both efficiency and accuracy.
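One plausible reading of the entropy-based mutation selection (a sketch under assumptions; the actual GraphPAS encoding and the direction in which entropy is used may differ) is to compute, for each architecture component, its information entropy across the population and normalize these values into per-component mutation probabilities:

```python
# Hypothetical sketch: entropy of each architecture component across the
# population, normalized into mutation selection probabilities.
import math
from collections import Counter

def mutation_probabilities(population):
    """population: list of architectures, each a list of component choices
    (e.g., aggregator, activation, hidden size). Returns one probability per
    component position, proportional to its entropy across the population."""
    n_components = len(population[0])
    entropies = []
    for i in range(n_components):
        counts = Counter(arch[i] for arch in population)
        total = sum(counts.values())
        h = -sum((c / total) * math.log(c / total) for c in counts.values())
        entropies.append(h)
    z = sum(entropies) or 1.0
    return [h / z for h in entropies]

pop = [["mean", "relu", "64"], ["max", "relu", "128"], ["mean", "tanh", "64"]]
print(mutation_probabilities(pop))   # more diverse positions get higher probability
```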
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes image and point cloud tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on the nuScenes benchmark. Moreover, CMT has strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
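A hedged sketch of the implicit spatial alignment idea (assumed shapes and module names, not CMT's implementation): each modality token is augmented with an encoding of its associated 3D points, so image and point-cloud tokens share a common spatial reference before the transformer:

```python
# Hypothetical sketch: adding a 3D-coordinate encoding to modality tokens.
import torch
import torch.nn as nn

class Coord3DEncoding(nn.Module):
    def __init__(self, embed_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )

    def forward(self, tokens: torch.Tensor, coords_3d: torch.Tensor) -> torch.Tensor:
        """tokens: (N, L, D) modality tokens; coords_3d: (N, L, 3) the 3D points
        associated with each token (e.g., sampled frustum points for image tokens)."""
        return tokens + self.mlp(coords_3d)

enc = Coord3DEncoding(256)
img_tokens = torch.randn(2, 1000, 256)
img_coords = torch.randn(2, 1000, 3)
fused = enc(img_tokens, img_coords)   # tokens now carry implicit 3D alignment
```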
Knowledge graphs (KGs) have served as a key component of various natural language processing applications. Commonsense knowledge graphs (CKGs) are a special type of KG, where entities and relations are composed of free-form text. However, previous works on KG completion and CKG completion suffer from long-tail relations and newly added relations that do not have many known triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to tackle the problem of limited annotated data. In this paper, we comprehensively survey previous attempts at such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.
Few-Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are two-fold: First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature level and instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking on the COCO dataset for the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few-Shot Object Detection. Code and model will be available.
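A hedged sketch of the first insight (masked average pooling of support features into dynamic class centers, then re-weighting query features); the cosine re-weighting and the shapes are illustrative assumptions, not the exact RefT module:

```python
# Sketch: support-mask pooling into class centers and query re-weighting.
import torch
import torch.nn.functional as F

def masked_class_centers(sup_feat: torch.Tensor, sup_mask: torch.Tensor) -> torch.Tensor:
    """sup_feat: (K, C, H, W) support features; sup_mask: (K, 1, H, W) binary masks.
    Returns (K, C) per-shot class centers via masked average pooling."""
    masked = sup_feat * sup_mask
    return masked.sum(dim=(2, 3)) / sup_mask.sum(dim=(2, 3)).clamp(min=1e-6)

def reweight_query(query_feat: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """query_feat: (C, H, W); centers: (K, C). Scale each query location by its
    cosine similarity to the mean class center."""
    center = F.normalize(centers.mean(dim=0), dim=0)               # (C,)
    q = F.normalize(query_feat, dim=0)                             # (C, H, W)
    sim = (q * center[:, None, None]).sum(dim=0, keepdim=True)     # (1, H, W)
    return query_feat * (1.0 + sim)
```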
Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
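As a generic, hypothetical illustration of the problem setting (not the RELIANT objective), a fairness-aware distillation loss might combine a standard KD term with a demographic-parity penalty between sensitive groups, assuming a binary task and a binary sensitive attribute:

```python
# Hypothetical sketch: KD loss plus a demographic-parity penalty.
import torch
import torch.nn.functional as F

def fair_kd_loss(student_logits, teacher_logits, labels, group,
                 T: float = 2.0, alpha: float = 0.5, beta: float = 0.1):
    """group: 0/1 tensor of sensitive-attribute membership per node (assumed binary)."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * T * T
    # Demographic-parity gap: difference in mean positive-class probability
    # between the two sensitive groups (assumes both groups are present).
    p = F.softmax(student_logits, dim=1)[:, 1]
    parity_gap = (p[group == 0].mean() - p[group == 1].mean()).abs()
    return (1 - alpha) * ce + alpha * kd + beta * parity_gap
```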
This paper focuses on designing efficient models with few parameters and low FLOPs for dense predictions. Even though CNN-based lightweight methods have achieved stunning results after years of research, trading off model accuracy against constrained resources still needs further improvement. This work rethinks the essential unity of the efficient Inverted Residual Block in MobileNetv2 and the effective Transformer in ViT, inductively abstracting a general concept of the Meta-Mobile Block, and we argue that the specific instantiation is very important to model performance even though it shares the same framework. Motivated by this phenomenon, we deduce a simple yet efficient modern \textbf{I}nverted \textbf{R}esidual \textbf{M}obile \textbf{B}lock (iRMB) for mobile applications, which absorbs CNN-like efficiency to model short-distance dependency and Transformer-like dynamic modeling capability to learn long-distance interactions. Furthermore, we design a ResNet-like 4-phase \textbf{E}fficient \textbf{MO}del (EMO) based only on a series of iRMBs for dense applications. Massive experiments on the ImageNet-1K, COCO2017, and ADE20K benchmarks demonstrate the superiority of our EMO over state-of-the-art methods, \eg, our EMO-1M/2M/5M achieve 71.5, 75.1, and 78.4 Top-1 accuracy, surpassing \textbf{SoTA} CNN-/Transformer-based models while trading off model accuracy and efficiency well.
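A hedged sketch of the idea behind an inverted residual mobile block that mixes a depthwise convolution (local, CNN-like efficiency) with attention (long-range, Transformer-like modeling) inside an expanded residual path; layer sizes and ordering here are assumptions, not the exact iRMB definition:

```python
# Hypothetical inverted-residual block combining attention and depthwise conv.
import torch
import torch.nn as nn

class InvertedResidualAttnBlock(nn.Module):
    def __init__(self, dim: int, expand: int = 4, heads: int = 4):
        super().__init__()
        hidden = dim * expand
        self.norm = nn.LayerNorm(dim)
        self.expand = nn.Conv2d(dim, hidden, kernel_size=1)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.dwconv = nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden)
        self.project = nn.Conv2d(hidden, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (N, C, H, W)
        n, c, h, w = x.shape
        y = self.norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        y = self.expand(y)                                 # expansion stage
        t = y.flatten(2).transpose(1, 2)                   # (N, H*W, hidden)
        t, _ = self.attn(t, t, t)                          # long-distance interactions
        y = y + t.transpose(1, 2).reshape(n, -1, h, w)
        y = self.dwconv(y)                                 # local, CNN-like mixing
        return x + self.project(y)                         # inverted residual

block = InvertedResidualAttnBlock(64)
out = block(torch.randn(1, 64, 14, 14))   # output has the same shape as the input
```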